INTRODUCTION



In research it frequently happens that K populations (K > 2) are to be
tested for homogeneity of variance. Tests of homogeneity of variance are 
appropriate in two general situations. First, these tests are used when the 
experimenter has an a priori interest in testing the equality of variances 
for K independent groups. Second, they are used in testing the assumption 
of homogeneity of variance needed to guarantee the accuracy of certain tests 
on means. 

Various theories in education and psychology have generated a priori 
hypotheses about equality of variances. For example, if high level skills 
build upon lower level skills as in Gagne's (1965) hierarchically arranged
behaviors, small variance in the low level skills would facilitate teaching
of higher skills. One of the major pieces of evidence for Cattell's two
factor theory of general intelligence is the difference in test score 
variance between tests of fluid and tests of crystallized abilities 
(Cattell, 1971). In classical mental test theory (Lord and Novick, 1968) 
parallel forms of tests are required to have equal variances. 

The use of tests of homogeneity of variance to guarantee the accuracy of
tests on means is considered by many to be unnecessary because the analysis
of variance is generally robust to violations of this assumption. However,
Box (1954) and Norton (Lindquist, 1953, p. 78) have shown that this
assumption is critical to the analysis of variance when n's are small or
unequal. The claim that heterogeneous variances are not important when n's
are equal seems to have boundary conditions which may not have been
sufficiently probed (Glass, Peckham and Sanders, 1972). Investigations of
the t statistic with unequal n's by Kohr (1970) and of three multiple
comparison procedures, the Multiple t test (Fisher, 1935), the Tukey Wholly
Significant Difference (Tukey, 1953; Miller, 1966) and the Scheffe (Scheffe,
1953), by Howell (1971) indicate that the assumption of homogeneity of
variance is critical to these procedures also.

Typically, independent random samples are compared via some function of
the sample variances with a known sampling distribution when assumptions are
met. Bartlett and Kendall (1946) suggested randomly dividing each of the K
samples into subsamples and computing a variance estimate on each subsample.
An analysis of variance (AOV) is then conducted on the variable $\ln s^2$ to
test the equality of the K variances. The additive model of the AOV is met
by using $\ln s^2$ as the dependent variable but not by using simply $s^2$.
This test is described in Scheffe (1959, p. 83), Odeh and Olds (1959) and
Winer (1971, p. 219). Games, Winkler and Probert (1972, p. 904) provided a
description and an example including follow-up comparisons. In the present
study stability of Type I error rates, power and a procedure for estimating
power were investigated for the Bartlett and Kendall test with various
subsample sizes when the assumption of normality was met or was violated.

TESTING EQUALITY OF VARIANCES 

In addition to the Bartlett and Kendall test a wide selection of tests
of homogeneity of variance is available in the statistical literature. The
first approach to the variance testing problem was made by Neyman and
Pearson (1931) using a likelihood-ratio statistic, approximately distributed
as chi-square (K - 1 degrees of freedom). The familiar Bartlett M test is
a modification of this statistic improving the approximation to chi-square
(Bartlett, 1937). A modification of the Bartlett test which adjusts M to
compensate for the population kurtosis was proposed by Box and Andersen
(1955). A statistic designed for the situation where just one of several
populations is suspected of having a larger variance was introduced by
Cochran (1941, 1951). The statistic is based on the ratio of the largest
sample variance to the sum of the sample variances:
$C = s^2_{\max} / \sum_k s^2_k$ (Myers, 1966, p. 73; Winer, 1971, p. 208).
Hartley (1950) derived a statistic, F max, comparing the
largest and smallest of the sample variances to reduce the computational
effort typical of tests on variances (Myers, 1966, p. 73; Winer, 1971, p. 206).
Cadwell (1952) suggested an extension of the Hartley technique in which
variance estimates based on sums of squares are replaced by estimates based
on ranges. Another test based on the analysis of variance of transformed
observations was suggested by Levene (1960). Levene proposed two
transformations:

(1) $Z_{ij} = |X_{ij} - \bar{X}_{.j}|$

(2) $S_{ij} = Z_{ij}^2 = (X_{ij} - \bar{X}_{.j})^2$.

Miller (1968) proposed doing an AOV on $Z'_{ij} = |X_{ij} - m_j|$ where $m_j$
is the median of the jth sample. The Foster and Burr Q test (Foster, 1964) is
based on a monotone function of the coefficient of variation of the sample
variances. For equal n, $Q = \sum_k s^4_k / (\sum_k s^2_k)^2$. Miller (1968)
applied the Tukey jackknife technique (Mosteller and Tukey, 1968) to the
variance testing problem. Layard (1973) suggested a $\chi^2$ statistic useful
with large samples. Various nonparametric tests have been developed and are
discussed in Klotz (1962). Only one, the Moses test (Moses, 1963), has been
used in recent comparisons with parametric tests.
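The three ratio statistics defined above (Cochran's C, Hartley's F max, and the Foster and Burr Q for equal n) can be sketched in a few lines of Python. The function names and the example variances are illustrative assumptions, not taken from any of the cited sources, and no critical values are supplied.

```python
def cochran_c(variances):
    """Cochran's C: largest sample variance over the sum of all variances."""
    return max(variances) / sum(variances)


def hartley_f_max(variances):
    """Hartley's F max: largest sample variance over the smallest."""
    return max(variances) / min(variances)


def foster_burr_q(variances):
    """Foster and Burr Q for equal n: sum of s^4 over (sum of s^2) squared."""
    return sum(s2 ** 2 for s2 in variances) / sum(variances) ** 2


# Hypothetical sample variances for K = 3 groups (illustration only).
sample_variances = [1.0, 2.0, 4.0]
print(cochran_c(sample_variances))
print(hartley_f_max(sample_variances))
print(foster_burr_q(sample_variances))
```

Each statistic would still have to be referred to its own table of critical values, which the sketch does not attempt to reproduce.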
An imposing problem with tests of homogeneity of variance and one of
the main reasons for the proliferation of tests in this area is the general
sensitivity (nonrobustness) of these statistics to violations of the
normality assumption. Although the assumption is trivial to many tests on
means it is crucial to tests on variances. The difference lies in the
standard errors of the two statistics used when inferences are being made.
For tests on means the standard error of $\bar{X}$ is
$\sigma_{\bar{X}} = \sigma / \sqrt{n}$ regardless of the population form,
but with variances the standard error of $s^2$ is

$$\sigma_{s^2} = \sigma^2 \sqrt{\frac{2}{n - 1} + \frac{\gamma_2}{n}}$$

where $\gamma_2$ is the index of kurtosis for the population (Johnson
and Jackson, 1959). Normal distributions have a $\gamma_2$ value of 0.0. In
making inferences about $s^2$ the assumption of normality is necessary not
only to show that the sampling distribution has a chi-square form but also to
fix the magnitude of the sampling fluctuation. If the population is
platykurtic ($-2 \le \gamma_2 < 0.0$) the value of $\gamma_2$ used in
theoretical derivation will be larger than the true value and a conservative
test will result. Conversely, with a leptokurtic population
($0.0 < \gamma_2 \le +\infty$) the true value will be larger than the
theoretical value, raising the probability of a
Type I error, P(EI), above alpha.
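A short numerical sketch may make the role of kurtosis concrete. It evaluates the standard error of the sample variance, $\sigma^2 \sqrt{2/(n-1) + \gamma_2/n}$, and compares it with the value that normal theory assumes ($\gamma_2 = 0$). The function names and the example values of n and $\gamma_2$ are assumptions chosen for illustration.

```python
import math


def se_sample_variance(sigma2, n, gamma2=0.0):
    """Standard error of s^2: sigma^2 * sqrt(2/(n-1) + gamma2/n)."""
    return sigma2 * math.sqrt(2.0 / (n - 1) + gamma2 / n)


def se_inflation(n, gamma2):
    """Ratio of the true standard error to the normal-theory value."""
    return se_sample_variance(1.0, n, gamma2) / se_sample_variance(1.0, n, 0.0)


# Illustrative values: a leptokurtic and a platykurtic population, n = 10.
for gamma2 in (6.0, 0.0, -1.2):
    print(gamma2, round(se_inflation(10, gamma2), 3))
```

A ratio above 1 corresponds to the leptokurtic case, in which normal theory understates the sampling fluctuation of $s^2$; a ratio below 1 corresponds to the platykurtic, conservative case.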

Scheffe (1959, p. 337) concluded that violations of the normality 
assumption produce dangerous effects on inferences about variances because 
although the theoretical distribution may have the correct location and 
shape, at least for large n, it may have the wrong spread if the true $\gamma_2$
differs from zero. Box (1953) suggested that robustness is the most 
important characteristic of a statistic even to the extent of sacrificing 
power to ensure control of Type I error. 
The sensitivity to nonnormality of the F test of two independent
sample variances was pointed out by Pearson (1931), Geary (1947) and Gayen
(1950). Box (1953) showed that this sensitivity is even greater when the
number of variances exceeds two. Box showed that Bartlett's M is
asymptotically distributed as $(1 + \gamma_2/2)\chi^2_{K-1}$. Both the F max
and the Cochran tests
were shown to be affected by kurtosis in much the same manner as the Bartlett
test. Box carried out a small sampling study comparing the Bartlett test
with the Bartlett and Kendall test for the case where the assumption of
normality was violated. The results for the Bartlett test showed extremely
large departures from the values expected from normal theory. In contrast,
the results for the Bartlett and Kendall test gave values agreeing with what
would be expected assuming that the logarithms of the variances were drawn
from a normal distribution.
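The practical size of this effect can be illustrated for three groups. If M is asymptotically $(1 + \gamma_2/2)$ times a chi-square variable with K - 1 = 2 degrees of freedom, the chi-square tail has the closed form $\exp(-x/2)$, so the large-sample rejection rate under a true null hypothesis can be written down directly. The sketch below is limited to K = 3 for that reason; the function name is an assumption.

```python
import math


def asymptotic_type1_bartlett_k3(alpha, gamma2):
    """Large-sample P(reject | H0 true) for Bartlett's M with K = 3 groups.

    Assumes M ~ (1 + gamma2/2) * chi-square(2 df), with the critical value
    taken from the unscaled chi-square(2 df) distribution.
    """
    scale = 1.0 + gamma2 / 2.0
    if scale <= 0.0:
        raise ValueError("gamma2 must exceed -2")
    critical = -2.0 * math.log(alpha)        # upper-alpha point of chi-square(2)
    return math.exp(-critical / (2.0 * scale))


for gamma2 in (-1.2, 0.0, 6.0):
    print(gamma2, round(asymptotic_type1_bartlett_k3(0.05, gamma2), 4))
```

With $\gamma_2 = 0$ the nominal level is recovered; positive kurtosis pushes the rate above alpha and negative kurtosis pushes it below.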

Levene (1960) compared the empirical sampling distributions of F ratios 
calculated on the $Z_{ij}$'s with the F distribution. Empirical probabilities
were significantly different from the nominal alpha when sampling was from 
a double exponential distribution. Miller (1968) suggested that this 
condition will not necessarily improve as n increases since Z is not 
asymptotically distribution free. Brown and Forsythe (1974) found better 
agreement with the nominal alpha when deviations were taken from the median 
rather than the mean. 

Using a monte carlo design, Miller (1968) examined the robustness of the
F, Box-Andersen M', Levene S, Bartlett and Kendall (referred to as the Box
test), Moses and the Tukey jackknife tests for the two group case with small
samples. Five distributions were used: (1) uniform, (2) normal, (3)
double exponential, (4) skew double exponential, and (5) sixth power.
The F test was found to be extremely nonrobust. Somewhat less sensitive
but still of questionable robustness were the Box-Andersen and jackknife
tests. The Levene S, Bartlett and Kendall, and Moses tests were generally
robust with empirical significance levels close to the nominally indicated
levels. In terms of robustness the F, Box-Andersen and jackknife tests
do not appear to be acceptable as tests of homogeneity of variance.

In a monte carlo study specifically designed to examine the robustness
of tests on variances, Fellers (1972) compared the Hartley approximation to
the Bartlett test (Hartley, 1940), the Cochran test, the F max test, the
Bartlett and Kendall test (referred to as the Scheffe test), and three forms
of the Levene test. Populations considered were normal,
leptokurtic-symmetric, platykurtic-symmetric, leptokurtic-skewed and
platykurtic-skewed. Equal and unequal n cases were considered for three
treatment groups with total N set at 15. Fellers' data showed the Bartlett
test to be conservative for the platykurtic populations and extremely
permissive for the leptokurtic populations. The Cochran and F max tests were
nonrobust for all but the near normal populations with the leptokurtic
populations producing the most extreme effects. Of the three Levene tests,
the S and Z lacked sufficient robustness to be considered as acceptable tests
of homogeneity of variance. The third, Z', based on the absolute deviations
from the median, produced results so extreme as to render them
uninterpretable.
Only the Bartlett and Kendall test proved to be generally robust for all
combinations of nonnormality and sample size.

Gartside (1972) investigated the stability of error rates when populations
had normal or Weibull (leptokurtic) distributions. In addition to the
Bartlett, Cochran, F max, and Bartlett and Kendall tests, Gartside included
modifications of the Bartlett and Bartlett and Kendall tests and added the
Cadwell test. Only the Bartlett and Kendall test maintained stable error
rates for the leptokurtic population. As the number of populations being
compared was increased the deviations from the nominal alpha also increased.
The latter supports Box (1953) who derived this conclusion mathematically.

The question of robustness was considered in two monte carlo studies 
reported by Games, Winkler and Probert (1972). The first study compared 
the F max, Cochran, two Levene (Z and S), and the Bartlett tests. In the
second study the Bartlett, Box-Andersen, Bartlett and Kendall, and Foster 
and Burr Q tests were compared. Samples were drawn from six populations: 
(1) normal, (2) slight skew, (3) moderate skew, (4) extreme skew, (5) 
symmetric leptokurtic and (6) rectangular. Tests were conducted on three 
samples of size 6 in the first study and of size 18 in the second. With 
n=18 two forms of the Bartlett and Kendall test were used. The first used 
nine subsamples of two cases each (LEV 2). The second used six subsamples
of size three (LEV 3). The results indicate that the Bartlett, F max, 
Cochran, Levene Z and S, Foster-Burr, and Box-Andersen tests are extremely 
sensitive to the shape of the underlying distribution. LEV 2 was conservative 
for most distributions and not really sensitive to distribution form. 
Although LEV 3 was slightly conservative it was most robust. Games et al. 
concluded that on the basis of control of Type I error alone the Bartlett 
and Kendall test is recommended for all situations. 

Only the Moses test and the Bartlett and Kendall test have been shown 
to be robust with respect to control of Type I error under the various 
conditions of nonnormality which have been investigated. These would be the 
recommended tests when there is suspected nonnormality in the data. 
Unfortunately power considerations will lead to a different recommendation. 

Pearson (1966) employed a monte carlo design to compare the power of
the Bartlett, F max, and Cadwell tests for five groups of five observations
each. For all situations the Bartlett test exhibited the greatest power.
Pearson recommended the Bartlett test but noted that none of the tests were
robust to violations of the assumption of normality.

In a monte carlo study of the two group case by Miller (1968) power 
for the F, Box-Andersen, Bartlett and Kendall, jackknife, Levene S and 
Moses tests was investigated. The most powerful test examined, the F test, 
was dismissed by Miller because of sensitivity to nonnormality. The Box- 
Andersen and the jackknife had approximately the same power and were the most 
powerful of the other tests. Slightly less powerful were the Bartlett and 
Kendall test and a form of the jackknife test. The Levene S was not as
powerful as those above. Least powerful of all the tests was the Moses test.
Miller suggested using the jackknife or the Box-Andersen as a general
technique for testing variances. If there is possible leptokurtosis in 
the population an experimenter would be sacrificing control of Type I error 
for power by following this suggestion. 

Gartside (1972) investigated power for the Bartlett, F max, Cochran,
Cadwell and Bartlett and Kendall tests for the case where the assumption 
of normality was met. The results showed the Bartlett test to be generally 
most powerful. The Cochran test was most powerful when only one of a set 
of variances was different. The F max and Cadwell tests showed fairly good 
power in all cases as did a modification of the Bartlett test. The 
Bartlett and Kendall test showed lower power. Gartside noted that the 
success of the Bartlett test must be tempered by its unstable error 
rates. He concluded that the more conservative Bartlett and Kendall test 
is preferred if there is reason to believe that the data are nonnormal. 

In the two sampling studies conducted by Games, Winkler and Probert
(1972) power of the selected tests on variances was investigated. Results
for the normal and extreme skew populations were similar with the Bartlett
and F max tests showing nearly identical power. The Cochran and both Levene
tests exhibited much lower power with the Levene S being the lowest of all.
In the second study the Bartlett and Foster-Burr Q tests consistently
showed the greatest power. The Bartlett and Kendall tests showed lower
power for all populations with LEV 3 considerably more powerful than LEV 2.
With the normal population the Box-Andersen test exhibited power near the
Bartlett test but with the extreme skew population its power decreased
to the level of LEV 3. Games et al. concluded that the Bartlett,
Foster-Burr and F max tests are most powerful but are relatively useless for
leptokurtic populations because of inflated P(EI)'s. The LEV 3 had power
superior to the Box-Andersen test on the highly leptokurtic populations
and stands as the best statistic for this population condition.

Games et al. (1972) noted that the biggest question in the application
of the Bartlett and Kendall test is: given the number of treatments, K,
and samples of n observations each, how many subsamples, m, should be
used? If fewer subsamples are used the variance estimates become more
stable producing a smaller mean square error for the AOV which increases 
power. But the degrees of freedom for the error term is also decreased when 
m is lowered which reduces power. There should be an optimal value for m 
which balances these two determinants of power. 

Box (1953) first suggested an investigation to determine this optimal 
value of m. Miller (1968) ignored the question, claiming that the choice of 
subsample size rests on the shoulders of the statistician. A similar point 
of view was taken by Winer (1971) who stated that the number of subsamples is 
arbitrary. He did suggest using subsamples of approximately equal size, 
preferably of size larger than three. To assure reasonable power Winer
recommended having the total number of subsamples minus the number of
treatment groups, i.e., the degrees of freedom of the AOV error term, 
at least equal to ten. 

Three subsample sizes were compared for power by Gartside (1972).
With n = 16, subsamples of size two, four and eight were used under several
conditions of variance heterogeneity with samples drawn from a normal
population. The intermediate arrangement, i.e., four subsamples of
size four, produced the maximum power.

With n=18, Games et al. compared sampling arrangements of six sub- 
samples of size three with nine subsamples of size two. Both arrangements 
provided acceptable control of Type I error although the latter was 
consistently conservative. Subsamples of size three produced higher power 
than subsamples of size two for all populations at all points where the 
null hypothesis was false. 

Games et al. suggested using the power functions of the analysis of
variance to select sample and subsample sizes for the Bartlett and Kendall
test. By setting K, n and the degree of falsity of the null hypothesis
approximate power could be found for different subsample sizes. Tables
of the power functions of the analysis of variance are readily available
(Myers, 1966; Winer, 1971). Setting K = 3 and K = 5, all n's from 12 to
36 that had any two of the numbers 3, 4, 5 and 6 as factors were explored.
The noncentrality parameter of these tables, $\phi$ (Myers, 1966, p. 77), is
discussed later in this paper. For a highly leptokurtic population,
$\gamma_2$ = 6.0, the results suggested subsamples of size three would result
in maximum power up to n = 18 with little loss in power up to n = 36.
Setting $\gamma_2$ = 0.0, a normal population, suggested subsamples of size
three up to n = 13 but of size four from n = 18 to n = 36. Asymptotic power
theory (Miller, 1968) suggests increasing subsample size for very large n.
No empirical test has been made of this method for selecting subsample size.

THE BARTLETT AND KENDALL STATISTIC 

In investigations of variance heterogeneity the Bartlett and Kendall 
test is appropriate for a one-way or higher analysis of variance layout. 
For simplicity of presentation only the test for a single factor, indepen- 
dent groups design is discussed. 

The Bartlett and Kendall test, under assumptions appropriate to the
analysis of variance, compares K samples of $n_i$ (i = 1, 2, ..., K)
observations each. Observations are randomized within treatment groups and
divided into $m_i$ subsamples of $v_{ij}$ (j = 1, 2, ..., $m_i$)
observations. On each subsample an estimate of the treatment variance,
$s^2_{ij}$, is computed. An AOV is then conducted using
$Y_{ij} = \ln s^2_{ij}$ as observations. If all $v_{ij}$ are equal
the test statistic is:

$$F = \frac{\left[\sum_i m_i (\bar{Y}_{i.} - \bar{Y}_{..})^2\right] \sum_i (m_i - 1)}{\left[\sum_i \sum_j (Y_{ij} - \bar{Y}_{i.})^2\right] (K - 1)}$$

where

$$\bar{Y}_{..} = \sum_i m_i \bar{Y}_{i.} \Big/ \sum_i m_i .$$

A weighted least squares solution was provided for the situation where
subsample sizes ($v_{ij}$) are not all equal by Scheffe (1959, p. 85). The
test statistic for unequal v is:

$$F = \frac{\left[\sum_i u_{i.} (\bar{Y}_{i.} - \bar{Y}_{..})^2\right] \sum_i (m_i - 1)}{\left[\sum_i \sum_j u_{ij} (Y_{ij} - \bar{Y}_{i.})^2\right] (K - 1)}$$

where

$$u_{ij} = v_{ij} - 1$$

$$u_{i.} = \sum_j u_{ij}$$

$$u_{..} = \sum_i u_{i.} = \sum_i \sum_j u_{ij}$$

$$\bar{Y}_{i.} = \sum_j u_{ij} Y_{ij} / u_{i.}$$

These two statistics can be compared to the F distribution with K - 1
and $\sum_i (m_i - 1)$ degrees of freedom.
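The computation described above can be sketched in Python as follows. This is a minimal illustration, not the program used in the study. It implements the weighted form, which reduces to the equal-v statistic when every subsample has the same size. Two points are assumptions made here: the grand mean is taken as the u-weighted mean of the group means, and leftover observations are placed in the last subsample. The function names are hypothetical.

```python
import math
import random
from statistics import variance


def split_into_subsamples(sample, v, rng=None):
    """Split one treatment group into subsamples of at least v observations."""
    values = list(sample)
    if v < 2 or len(values) < v:
        raise ValueError("need v >= 2 and at least v observations")
    if rng is not None:
        rng.shuffle(values)              # randomize within the treatment group
    m = len(values) // v
    parts = [values[j * v:(j + 1) * v] for j in range(m)]
    parts[-1].extend(values[m * v:])     # assumption: remainder joins last part
    return parts


def bartlett_kendall_f(samples, v, rng=None):
    """Return (F, df_between, df_within) from a weighted AOV on ln s^2."""
    groups = []
    for sample in samples:
        parts = split_into_subsamples(sample, v, rng)
        # Each subsample gives a weight u = v_ij - 1 and Y_ij = ln s^2_ij.
        groups.append([(len(p) - 1, math.log(variance(p))) for p in parts])

    u_i = [sum(u for u, _ in g) for g in groups]
    ybar_i = [sum(u * y for u, y in g) / ui for g, ui in zip(groups, u_i)]
    # Assumption: grand mean is the u-weighted mean of the group means.
    ybar = sum(ui * yb for ui, yb in zip(u_i, ybar_i)) / sum(u_i)

    between = sum(ui * (yb - ybar) ** 2 for ui, yb in zip(u_i, ybar_i))
    within = sum(u * (y - yb) ** 2
                 for g, yb in zip(groups, ybar_i) for u, y in g)
    df_between = len(groups) - 1
    df_within = sum(len(g) - 1 for g in groups)
    return (between / df_between) / (within / df_within), df_between, df_within


# Illustrative data: three groups of 18 normal deviates with unequal spread.
demo_rng = random.Random(0)
demo = [[demo_rng.gauss(0.0, sd) for _ in range(18)] for sd in (1.0, 2.0, 4.0)]
f_stat, df1, df2 = bartlett_kendall_f(demo, v=3)
print(f_stat, df1, df2)
```

The returned F would be referred to the F distribution with the two degrees of freedom shown. A subsample with zero variance (all values tied) has no logarithm and would need separate handling.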

The power functions of the analysis of variance appear appropriate for
estimating power and for selecting n and v (all equal) for a given
power for the Bartlett and Kendall test. A priori estimates of power can
be made using the Pearson and Hartley (1951, p. 112) charts of the power
functions for the analysis of variance. Scheffe (1959, p. 85) showed that

$$\sigma^2_{\ln s^2} \approx \frac{2}{v - 1} + \frac{\gamma_2}{v}$$

which is the $E(MS_W)$ in an AOV on $\ln s^2$. Let $\phi$ be
the noncentrality parameter of these charts (Myers, 1966, p. 77). Given
K sets of m independent observations each, each observation being the
logarithmic transformation of a sample variance of v observations,

$$\phi = \sqrt{\frac{m\theta}{\sigma^2_{\ln s^2}}} = \sqrt{\frac{m\theta}{\frac{2}{v - 1} + \frac{\gamma_2}{v}}}$$

where

$$\theta = \sum_i (\ln \sigma^2_i - \overline{\ln \sigma^2})^2 / K$$

$$\overline{\ln \sigma^2} = \sum_i \ln \sigma^2_i / K .$$
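The trade-off between subsample size and number of subsamples can be explored numerically with these formulas. The sketch below computes $\phi$ and the error degrees of freedom for a range of v at fixed n. The function names are assumptions, m is taken as the integer part of n/v, and $\theta$ is expected in natural-log units. A $\theta$ expressed in common logarithms can be converted by multiplying by $(\ln 10)^2$; whether such a conversion was applied in the original power estimates is not stated in the text.

```python
import math

LN10 = math.log(10.0)


def var_ln_s2(v, gamma2=0.0):
    """Approximate variance of ln s^2 for a subsample of v observations."""
    return 2.0 / (v - 1) + gamma2 / v


def phi(n, v, theta_ln, gamma2=0.0):
    """Noncentrality parameter phi, with m = n // v subsamples per group."""
    m = n // v
    return math.sqrt(m * theta_ln / var_ln_s2(v, gamma2))


def error_df(n, v, K=3):
    """Degrees of freedom for the AOV error term: K * (m - 1)."""
    return K * (n // v - 1)


# Illustrative: theta = 0.0806 in common logs, converted to natural logs.
theta_ln = 0.0806 * LN10 ** 2
for v in (3, 4, 5, 6, 7, 8):
    print(v, 36 // v, error_df(36, v), round(phi(36, v, theta_ln), 3))
```

Power itself would then be read from the charts for the given $\phi$ and degrees of freedom; the sketch stops short of that step.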
METHOD

Because of the intractability of using direct mathematical analysis a 
monte carlo design was employed. For each combination of conditions inves- 
tigated a simulated analysis was repeated 1,000 times in four blocks of 
250 times each. The number and proportion of rejections of the null hypo- 
thesis at the one and five percent levels were recorded for each of the 
four blocks. The frequencies of rejection became dependent measures in 
analyses of variance of the conditions investigated. 

In this study only the three independent group, equal n case for the
Bartlett and Kendall test was investigated. Sample sizes representing
small, intermediate and large sample situations, with n's of 18, 36 and 48
respectively, were selected. For each n a set of six v's was investigated:
n = 18, v = 2, 3, 4, 5, 6, 7; n = 36, v = 3, 4, 5, 6, 7, 8; n = 48, v = 5,
6, 7, 8, 9, 10. It was predicted that from each set one v would be found
which would produce maximum power for the given sample size. Where n/v
was not an integer, v was the minimum subsample size. In this case the last
subsample formed consisted of v plus any remaining elements (always fewer
than v).
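The allocation rule for the case where n/v is not an integer can be written out as a small helper. The function name is hypothetical; the rule itself follows the description above, with the remainder added to the last subsample.

```python
def subsample_sizes(n, v):
    """Sizes of the subsamples formed from n observations with minimum size v."""
    if v < 2 or v > n:
        raise ValueError("need 2 <= v <= n")
    m = n // v                      # number of subsamples
    sizes = [v] * m
    sizes[-1] += n - m * v          # leftover elements join the last subsample
    return sizes


for n, v in ((18, 4), (18, 7), (36, 6), (48, 9)):
    print(n, v, subsample_sizes(n, v))
```

For n = 18 and v = 7, for example, only two subsamples are formed, which leaves a single error degree of freedom per group.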

Control of Type I error was investigated for the case of equal
treatment variances. To investigate power three conditions of variance
heterogeneity were used. Variances were formed by multiplying each element in
a treatment group by a constant representing the desired standard deviation
for that group. Variances were chosen for each sample size which would
provide diverse power points over the complete power range. Table 1
presents the constants selected and $\theta$, a measure of the degree of
variability in the set of variances.
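As an illustration of how $\theta$ follows from the multiplying constants, the sketch below squares each standard deviation constant to obtain the group variance and applies the definition of $\theta$ with common logarithms, the base noted under Table 1. The function name is an assumption.

```python
import math


def theta_from_constants(sd_constants, log=math.log10):
    """Mean squared deviation of the log variances about their mean."""
    logs = [log(c ** 2) for c in sd_constants]   # variance = constant squared
    mean_log = sum(logs) / len(logs)
    return sum((x - mean_log) ** 2 for x in logs) / len(logs)


for constants in ((1, 1, 1), (1, 2, 2), (1, 2, 4)):
    print(constants, round(theta_from_constants(constants), 4))
```

Passing `log=math.log` instead gives $\theta$ in natural-log units, which is larger by the factor $(\ln 10)^2$.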

The simulations consisted of a series of experiments, each conducted
with three samples, representing three treatment groups, of n observations
each. Sampling was random with replacement from two populations of 10,000
cases. A normally distributed population and a population with extreme
skewness and leptokurtosis ($\chi^2$ with 2 df) were used for the situations
where the normality assumption was met and violated respectively. The normally
distributed population was constructed by dividing it into 28 intervals of
0.3 standard deviation units ranging from -4.2 to 4.2 standard deviations
with known expected proportions of cases for each interval. Uniform
values of half the interval width were randomly generated and added to or
subtracted from the interval midpoint to form normal deviates. The
leptokurtic distribution was constructed by forming cases of the sum of two
squared normal deviates. A procedure by Chen (1971) (made available
after the study using a normal population had been conducted) was used to
generate these deviates. Parameters ($\mu$, $\sigma^2$, $\gamma_1$,
$\gamma_2$) were computed for each population. The parameters were
$\mu$ = 0.0003, $\sigma^2$ = 1.0123, $\gamma_1$ = -0.0010,
$\gamma_2$ = -0.0166 and $\mu$ = 1.9701, $\sigma^2$ = 4.0290,
$\gamma_1$ = 2.0275, $\gamma_2$ = 6.0105 for the
normal and leptokurtic populations respectively.
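A rough sketch of the two population constructions is given below. It is not the original program. Two simplifications are assumed: intervals are chosen at random with probability equal to their expected proportion (rather than filled with fixed expected counts), and `random.gauss` stands in for the Chen (1971) procedure, which is not described here. All function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def build_normal_population(size=10_000, width=0.3, limit=4.2, rng=None):
    """Approximate normal deviates: pick an interval, then a uniform offset."""
    rng = rng if rng is not None else random.Random()
    std_normal = NormalDist()
    n_intervals = round(2 * limit / width)           # 28 intervals
    lowers = [-limit + i * width for i in range(n_intervals)]
    weights = [std_normal.cdf(lo + width) - std_normal.cdf(lo) for lo in lowers]
    chosen = rng.choices(lowers, weights=weights, k=size)
    half = width / 2
    # Midpoint plus or minus a uniform value of up to half the interval width.
    return [lo + half + rng.uniform(-half, half) for lo in chosen]


def build_leptokurtic_population(size=10_000, rng=None):
    """Chi-square with 2 df: the sum of two squared normal deviates."""
    rng = rng if rng is not None else random.Random()
    return [rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2
            for _ in range(size)]


def moments(values):
    """Return (mean, variance, skewness gamma1, excess kurtosis gamma2)."""
    mean = fmean(values)
    m2 = fmean([(x - mean) ** 2 for x in values])
    m3 = fmean([(x - mean) ** 3 for x in values])
    m4 = fmean([(x - mean) ** 4 for x in values])
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


seed_rng = random.Random(1974)
normal_pop = build_normal_population(rng=seed_rng)
lepto_pop = build_leptokurtic_population(rng=seed_rng)
print(moments(normal_pop))
print(moments(lepto_pop))
```

Samples for each simulated experiment could then be drawn with replacement, for example with `rng.choices(normal_pop, k=n)`.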

A priori estimates of power using $\phi$ (Myers, 1966, p. 77) were compared
with empirical results to examine the accuracy of this approximation. Phi
was computed as if v's in a given sample were actually equal. Values of
$\phi$ were calculated and approximate powers were taken from power curves in
Myers (1966, p. 390). Empirical powers were the proportions of rejection
of a false null hypothesis.
RESULTS



Analysis of Type I errors was conducted in terms of proportions of 
rejection (p) of a true null hypothesis. Of 72 sample p's only three 
differed significantly at the .05 level from the nominal error rate.
Only one of these significant p's occurred with the leptokurtic population. 
Table 2 presents sample p's for both populations at the one and five 
percent levels. These p's are proportions averaged over the four blocks 
of trials. The Bartlett and Kendall test was shown to be robust when 
populations are extremely leptokurtic for all combinations of n and v 
investigated. 
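The text does not say how each sample p was compared with the nominal rate. One plausible reading, offered here only as an assumption, is a two-sided normal-approximation test of a proportion based on the 1,000 replications per condition. The sketch below applies that criterion; the function name and the 1.96 cutoff are part of the assumption.

```python
import math


def deviates_from_nominal(p_hat, alpha, reps=1000, z_crit=1.96):
    """True if p_hat differs from alpha by more than z_crit standard errors."""
    se = math.sqrt(alpha * (1.0 - alpha) / reps)
    return abs(p_hat - alpha) / se > z_crit


# A few proportions of the kind reported in Table 2, as (p, nominal alpha).
for p_hat, alpha in ((0.034, 0.05), (0.065, 0.05), (0.061, 0.05), (0.015, 0.01)):
    print(p_hat, alpha, deviates_from_nominal(p_hat, alpha))
```

Under this criterion a proportion must fall outside roughly .036 to .064 at the five percent level, or outside roughly .004 to .016 at the one percent level, to be flagged.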

The focus of the present study was on selecting subsample sizes which 
would produce maximum power for a given n. To compare power produced by 
the different v's, analyses of variance were conducted for the conditions 
investigated. The dependent measures were the frequencies of rejection 
for the four blocks of trials. Subsample size, population form, and $\theta$
were the factors considered. The three main effects and the population form
by subsample size interaction were found to be significant ($\alpha$ = .05).
The $\theta$ effect was expected and is trivial. It is simply a measure of the
deviation from the null hypothesis. A population form effect was expected 
and indicates a lower power for the leptokurtic population. 

The interaction of population form and subsample size represents the 
major complication of the study. It indicates that a single v may not 
produce maximum power for both population forms. Had the desired results 
occurred there would have been only a main subsample size effect and no 
interaction. Because of the interaction the subsample size effect was
investigated for each population form separately. Table 3 presents the
mean frequencies of rejections (averaged over the three $\theta$ conditions
and four blocks of trials) for each v under each population form at the one
and five percent levels. The v's are ranked in terms of power produced.
Multiple comparisons via the Newman-Keuls technique indicated that in
most cases no single v was optimal (produced the greatest power) in any
given set. When the mean power was significantly less ($\alpha$ = .05) than
the maximum in the column it is marked with an *.

Considering both nominal error rates for n = 18, optimal v's were v = 4
for the normal population and v = 3 for the leptokurtic population. At
n = 36 with the normal population, no significant differences were found
between v's of six and seven. The leptokurtic results provided no clear
way to choose between v = 4, v = 5, or v = 6. Thus with n = 36, one could
generally use v = 6 with relatively little danger of appreciable power
loss. When n = 48 the results were clearer. For the normal population
v = 8 was most powerful and for the leptokurtic population v = 6 and v = 5
produced approximately equal maximum power.

A priori estimates of power based on $\phi$ (Myers, 1966, p. 77) were compared
with power empirically produced in the sampling study. A moderate degree
of correspondence was found between the two. Pearson r's were computed
for each combination of n, population form and nominal error rate. The
minimum r computed was .86 and the maximum was .98 over the twelve sets
of data. A large part of this relationship was due to the differences in
$\theta$. As an example Table 4 presents empirical and a priori powers for
n = 48 and $\alpha$ = .05. The AOV power functions appear useful in providing
rough estimates of power for the Bartlett and Kendall test.
DISCUSSION



The robustness of the Bartlett and Kendall test was supported. Even
when the population from which samples are drawn is extremely leptokurtic
the Bartlett and Kendall test provides control of P(EI) at the nominal
alpha level. Control is unaffected by sample and subsample sizes. This
test may be recommended for general use even when populations have
suspected leptokurtosis.

Several researchers (Box, 1953; Games et al., 1972; Gartside, 1972;
Miller, 1968; Scheffe, 1959) have pointed out the lack of knowledge for
choosing v in an analysis. This study suggests two rules of thumb for
selecting a most powerful v for a fixed n. Find a value near $\sqrt{n}$,
rounding noninteger values higher if the subject population is normal or
lower if leptokurtosis is suspected. A second suggestion is to choose values
at this point which are even divisors of n. The power differences between
adjacent values of v are sufficiently small so that there usually would be
little power loss if the value used differs by only one from the optimum
value of v. These rules are in agreement with Gartside (1972) who noted
that an intermediate arrangement would appear to give more power.

The given approximation to $\phi$ and the power functions of the analysis
of variance are shown to be generally accurate for selecting power for the
Bartlett and Kendall test. Using the above rules of thumb for selection of
v, an experimenter should be able to approximate the power for any n chosen.

A problem arises if v's are selected in the suggested manner when n's
are unequal. In this situation the v's will also be unequal. The treatment
group with the largest n and thus largest v will have the most stable
variance estimates. Hence this treatment group will have a smaller within
cell variance in the AOV. Conversely, the group with the smallest n will
have a larger within cell variance. Since m = n/v the sample sizes for
the AOV will also be unequal. This combination of heterogeneous within
cell variances and unequal n's may present problems for the AOV.
TABLE 1
Constants and $\theta$'s

Sample   Variance
 Size    Condition   $\theta$*    Group 1   Group 2        Group 3

  18        1        0.0000         1         1              1
  18        2        0.0806         1         2              2
  18        3        0.1552         1         2              3
  18        4        0.2417         1         2              4
  36        1        0.0000         1         1              1
  36        2        0.0388         1      $\sqrt{2}$     $\sqrt{3}$
  36        3        0.0604         1      $\sqrt{2}$        2
  36        4        0.0806         1         2              2
  48        1        0.0000         1         1              1
  48        2        0.0201         1      $\sqrt{2}$     $\sqrt{2}$
  48        3        0.0388         1      $\sqrt{2}$     $\sqrt{3}$
  48        4        0.0604         1      $\sqrt{2}$        2

* Theta's reported in the present study are based on common logarithms.
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TABLE 2 

Proportion of Significant Results
When H0 is True

              Skewed-Leptokurtic           Normal
                  Population             Population
  n     v      α=.01     α=.05        α=.01     α=.05

 18     2      .010      .051         .006      .034*
 18     3      .014      .058         .010      .041
 18     4      .008      .051         .015      .052
 18     5      .008      .065*        .006      .044
 18     6      .006      .049         .007      .051
 18     7      .009      .050         .012      .057
 36     3      .013      .049         .004      .041
 36     4      .013      .052         .009      .045
 36     5      .006      .061         .008      .042
 36     6      .007      .039         .005      .030*
 36     7      .011      .053         .009      .047
 36     8      .008      .052         .010      .045
 48     5      .013      .044         .013      .051
 48     6      .011      .044         .014      .054
 48     7      .005      .040         .013      .049
 48     8      .006      .048         .010      .056
 48     9      .012      .053         .009      .047
 48    10      .009      .040         .010      .059

* Represents significant deviation from α.
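
One conventional way to flag a significant deviation of an empirical rejection rate from the nominal α is a two-sided normal-approximation band around α. The sketch below illustrates that idea only; it is not claimed to be the criterion used for Table 2. The number of replications per cell is not stated in this section, so the value 1,000 is purely hypothetical, as are the function name `deviates_from_alpha` and the critical value z = 1.96.

```python
import math


def deviates_from_alpha(p_hat, alpha, reps, z=1.96):
    """True if p_hat lies outside alpha +/- z * sqrt(alpha * (1 - alpha) / reps)."""
    standard_error = math.sqrt(alpha * (1 - alpha) / reps)
    return abs(p_hat - alpha) > z * standard_error


REPS = 1000  # hypothetical replication count; not given in this section

# A few alpha = .05 entries from Table 2.
for p_hat in (0.034, 0.065, 0.030, 0.039, 0.061):
    flagged = deviates_from_alpha(p_hat, 0.05, REPS)
    print(f"p = {p_hat:.3f}  outside band: {flagged}")
```

With the assumed 1,000 replications, the band flags .034, .065, and .030 but not .039 or .061.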

TABLE 3

Mean Frequency of Rejection for
Population Form X Subsample
Size Interaction

       Normal Population               Skewed-Leptokurtic Population

                                n = 18

  v    α=.01       v    α=.05          v    α=.01       v    α=.05

  4   111.25       6   181.67          3    52.17       3   [illegible]
  3   104.33*      4   177.67          4    46.92*      4   108.25
  6    99.58*      5   176.83          2    41.50*      6   102.92*
  5    92.25*      3   167.83*         6    37.50*      5    98.08*
  2    62.75*      7   133.17*         5    32.58*      2    91.00*
  7    46.42*      2   115.58*         7    17.25*      7    68.58*

                                n = 36

  v    α=.01       v    α=.05          v    α=.01       v    α=.05

  7   127.75       7   197.25          4    49.33       5   105.58
  5   125.17       6   194.00          3    47.67       6   104.17
  6   122.50*      8   190.25*         5    45.83       4   102.58
  4   118.58*      5   187.92*         6    44.58       7    97.50*
  8   113.42*      4   175.83*         7    41.33*      3    94.58*
  3    94.92*      3   152.83*         8    36.83*      8    92.75*

                                n = 48

  v    α=.01       v    α=.05          v    α=.01       v    α=.05

  8   126.42       8   184.83          6    42.83       6    92.75
  6   121.42*      9   182.75          5    41.75       5    90.17
  9   120.92*      6   175.58*         7    37.00*      7    88.17
  5   117.83*      7   174.58*         8    36.08*      8    84.58*
  7   116.25*     10   173.08*         9    36.00*      9    82.83*
 10   104.67*      5   171.58*        10    28.42*     10    73.42*

* Represents significant deviation from maximum, α = .05.

TABLE 4

Empirical and A Priori Powers
n = 48, α = .05

                       Normal Population           Skewed-Leptokurtic Population

                               A Priori                        A Priori
  v    df      θ        φ     Estimated  Obtained       φ     Estimated  Obtained

  5    24    .0201    1.386      .52       .443       0.752      .21       .204
  5    24    .0388    1.924      .80       .720       1.044      .30       .354
  5    24    .0604    2.401      .94       .896       1.302      .42       .524
  6    21    .0201    1.461      .51       .462       0.781      .21       .217
  6    21    .0388    2.029      .82       .729       1.084      .40       .359
  6    21    .0604    2.531      .95       .916       1.353      .48       .537
  7    15    .0201    1.386      .50       .448       0.734      .20       .214
  7    15    .0388    1.924      .76       .734       1.018      .29       .340
  7    15    .0604    2.401      .91       .913       1.271      .40       .504
  8    15    .0201    1.497      .55       .499       0.786      .19       .184
  8    15    .0388    2.079      .80       .781       1.092      .31       .338
  8    15    .0604    2.594      .96       .938       1.362      .48       .493
  9    12    .0201    1.461      .51       .491       0.763      .15       .188
  9    12    .0388    2.029      .78       .777       1.059      .27       .339
  9    12    .0604    2.531      .94       .925       1.322      .40       .467
 10     9    .0201    1.386      .41       .454       0.721      .15       .177
 10     9    .0388    1.924      .70       .720       1.000      .24       .286
 10     9    .0604    2.401      .89       .903       1.248      .33       .418
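
The df column of Table 4 appears consistent with an AOV on K(m - 1) within-cell degrees of freedom, where m is the whole number of subsamples of size v that fit into n. The sketch below checks this under two assumptions that are inferred rather than stated here: K = 3 groups (as in Table 1 and the Appendix) and integer division for m. The helper name `within_df` is hypothetical.

```python
def within_df(n, v, k=3):
    """Within-cell df for the AOV on log variances.

    Assumes k groups, each split into m = n // v subsamples of size v,
    so that each group contributes m - 1 degrees of freedom.
    """
    m = n // v
    return k * (m - 1)


# Subsample sizes examined in Table 4 for n = 48.
for v in range(5, 11):
    print(f"v = {v:2d}  m = {48 // v}  within df = {within_df(48, v)}")
```

Because m is a whole number, neighbouring values of v can share the same df (for example v = 7 and v = 8 when n = 48), which matches the repeated df entries in the table.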

APPENDIX



Computational Example of the
Bartlett and Kendall Test
n_k = 7, 7, 7;  v_k = 3, 3, 3

Data Matrix

 x_i1    s²_i1    ln s²_i1       x_i2    s²_i2     ln s²_i2       x_i3    s²_i3      ln s²_i3

  14                              10                               31
   8     9.0000    2.1972          9     19.0000    2.9444         10    120.3333    4.7903
  11                              17                               15

  10                              12                               15
  14     2.9167    1.0704         12     10.2499    2.3272         36     94.2500    4.5459
  12                              18                               24
  11                              17                               16

 s²_1 = 4.6190                   s²_2 = 13.6190                   s²_3 = 92.0000


Analysis of Variance

Y_ik = ln s²_ik entries for ANOVA

                     T1         T2          T3

                   2.1972     2.9444      4.7903        ΣΣY    = 17.8756
                   1.0704     2.3272      4.5459        ΣΣY²   = 63.6719
                                                        SS_TOT = 10.4157
 Ȳ_k               1.6338     2.6358      4.6681        SS_W   =  0.8551
 Antilog (Ȳ_k)     5.1233    13.6358    106.4952        SS_BET =  9.5606

       9.5606 / 2      4.7803
 F  =  -----------  =  ------  =  16.7708  with df = 2, 3
        .8551 / 3       .2850
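
The computational example above can be retraced in a few lines of code. This is a minimal sketch, not the authors' program: it assumes that each group's scores are split in their listed order into consecutive subsamples of size v, with any leftover scores joining the last subsample (as the 3 and 4 split in the example suggests), and that the test is an ordinary one-way AOV on the natural logs of the subsample variances. The function names `log_variances` and `one_way_f` are hypothetical.

```python
import math
from statistics import mean, variance

# Scores from the Data Matrix: three groups of n = 7, subsample size v = 3.
groups = [
    [14, 8, 11, 10, 14, 12, 11],
    [10, 9, 17, 12, 12, 18, 17],
    [31, 10, 15, 15, 36, 24, 16],
]


def log_variances(scores, v):
    """Return ln(s^2) for each of the n // v consecutive subsamples.

    Leftover scores are placed in the last subsample, so 7 scores
    with v = 3 give subsamples of sizes 3 and 4.
    """
    m = len(scores) // v
    cuts = [i * v for i in range(m)] + [len(scores)]
    return [math.log(variance(scores[a:b])) for a, b in zip(cuts, cuts[1:])]


def one_way_f(cells):
    """One-way AOV: return (F, df between, df within)."""
    all_y = [y for cell in cells for y in cell]
    grand_mean = mean(all_y)
    ss_between = sum(len(c) * (mean(c) - grand_mean) ** 2 for c in cells)
    ss_within = sum((y - mean(c)) ** 2 for c in cells for y in c)
    df_between = len(cells) - 1
    df_within = len(all_y) - len(cells)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


y = [log_variances(g, 3) for g in groups]
f_ratio, df_b, df_w = one_way_f(y)
print("ln s^2 entries:", [[round(v_, 4) for v_ in cell] for cell in y])
print(f"F = {f_ratio:.4f} with df = {df_b}, {df_w}")
```

Under these assumptions the sketch gives an F ratio of approximately 16.77 on 2 and 3 degrees of freedom, in agreement with the hand computation.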